{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Merchandise Popularity Prediction Challenge**  --- **`AMEYA PATIL`**\n",
    "\n",
    "Overview\n",
    "Big Brands spend a significant amount on popularizing a product. Nevertheless, their efforts go in vain while establishing the merchandise in the hyperlocal market. Based on different geographical conditions same attributes can communicate a piece of much different information about the customer. Hence, insights this is a must for any brand owner.\n",
    "\n",
    "In this competition, we have brought the data gathered from one of the top apparel brands in India. Provided the details concerning category, score, and presence in the store, participants are challenged to predict the popularity level of the merchandise. \n",
    "\n",
    "The popularity class decides how popular the product is given the attributes which a store owner can control to make it happen.\n",
    "\n",
    "**Dataset Description:**\n",
    "\n",
    "Train.csv - 18208 rows x 12 columns (Includes popularity Column as Target variable)\\\n",
    "Test.csv - 12140 rows x 11 columns\\\n",
    "Sample Submission.csv - Please check the Evaluation section for more details on how to generate a valid submission\\\n",
    " \n",
    "**Attributes:**\n",
    "* store_ratio\n",
    "* basket_ratio\n",
    "* category_1 \n",
    "* store_score\n",
    "* category_2\n",
    "* store_presence\n",
    "* score_1\n",
    "* score_2 \n",
    "* score_3\n",
    "* score_4\n",
    "* time\n",
    "* popularity - Class of popularity (Target Column)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-01-25T12:57:47.895262Z",
     "start_time": "2021-01-25T12:57:47.883261Z"
    }
   },
   "source": [
    "# Import Libraries and Name Submission File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:29:44.098314Z",
     "start_time": "2021-02-07T15:29:37.325757Z"
    }
   },
   "outputs": [],
   "source": [
    "# Import Necessary Libraries\n",
    "from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier\n",
    "from catboost import CatBoostClassifier\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
    "from sklearn.model_selection import train_test_split\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Ignore Warnings \n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Random State\n",
    "seed = 42\n",
    "np.random.seed(seed)\n",
    "\n",
    "# Create Submission File -- Name for CSV File \n",
    "subfile_name = 'A_7feb_E03.csv'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-01-25T12:07:09.836815Z",
     "start_time": "2021-01-25T12:07:09.824805Z"
    }
   },
   "source": [
    "# Read & DataPreprocessing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:29:44.208023Z",
     "start_time": "2021-02-07T15:29:44.120258Z"
    }
   },
   "outputs": [],
   "source": [
    "train  = pd.read_csv('train.csv')\n",
    "test   = pd.read_csv('test.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:29:44.311263Z",
     "start_time": "2021-02-07T15:29:44.285297Z"
    }
   },
   "outputs": [],
   "source": [
    "train  = train.drop_duplicates().reset_index(drop=True) # Drop Duplicates "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:29:44.393010Z",
     "start_time": "2021-02-07T15:29:44.386029Z"
    }
   },
   "outputs": [],
   "source": [
    "X = train.drop(['popularity'], axis = 1)\n",
    "y = train['popularity']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## One Hot Encode and Rescale"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:29:44.577674Z",
     "start_time": "2021-02-07T15:29:44.541613Z"
    }
   },
   "outputs": [],
   "source": [
    "#cols = ['Category_1','Category_2'] # Without one hot encoding Category_1 score improved\n",
    "cols = ['Category_2']\n",
    "X_scaled = pd.get_dummies(X, columns=cols)\n",
    "test_scaled = pd.get_dummies(test , columns=cols)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:29:44.943637Z",
     "start_time": "2021-02-07T15:29:44.876618Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(15285, 12)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "(12140, 12)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# SCALING\n",
    "scaler = MinMaxScaler()\n",
    "X_scaled = scaler.fit_transform(X_scaled)\n",
    "test_scaled = scaler.transform(test_scaled)\n",
    "\n",
    "# Display\n",
    "display(X_scaled.shape, test_scaled.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Voting Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:35:39.599061Z",
     "start_time": "2021-02-07T15:29:45.190931Z"
    },
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0:\tlearn: 1.5825669\ttotal: 267ms\tremaining: 4m 26s\n",
      "400:\tlearn: 0.4635001\ttotal: 30.9s\tremaining: 46.1s\n",
      "800:\tlearn: 0.4377283\ttotal: 1m 1s\tremaining: 15.2s\n",
      "999:\tlearn: 0.4295014\ttotal: 1m 16s\tremaining: 0us\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    5.6s\n",
      "[Parallel(n_jobs=-1)]: Done 536 tasks      | elapsed:    9.0s\n",
      "[Parallel(n_jobs=-1)]: Done 1536 tasks      | elapsed:   15.7s\n",
      "[Parallel(n_jobs=-1)]: Done 2936 tasks      | elapsed:   24.6s\n",
      "[Parallel(n_jobs=-1)]: Done 4736 tasks      | elapsed:   36.5s\n",
      "[Parallel(n_jobs=-1)]: Done 6000 out of 6000 | elapsed:   45.9s finished\n",
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done  52 tasks      | elapsed:    0.9s\n",
      "[Parallel(n_jobs=-1)]: Done 352 tasks      | elapsed:    5.4s\n",
      "[Parallel(n_jobs=-1)]: Done 852 tasks      | elapsed:   12.9s\n",
      "[Parallel(n_jobs=-1)]: Done 1552 tasks      | elapsed:   23.0s\n",
      "[Parallel(n_jobs=-1)]: Done 2452 tasks      | elapsed:   36.7s\n",
      "[Parallel(n_jobs=-1)]: Done 3552 tasks      | elapsed:   53.2s\n",
      "[Parallel(n_jobs=-1)]: Done 4852 tasks      | elapsed:  1.2min\n",
      "[Parallel(n_jobs=-1)]: Done 6352 tasks      | elapsed:  1.5min\n",
      "[Parallel(n_jobs=-1)]: Done 8052 tasks      | elapsed:  1.9min\n",
      "[Parallel(n_jobs=-1)]: Done 8985 out of 9000 | elapsed:  2.1min remaining:    0.1s\n",
      "[Parallel(n_jobs=-1)]: Done 9000 out of 9000 | elapsed:  2.1min finished\n",
      "[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.\n",
      "[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.4s\n",
      "[Parallel(n_jobs=8)]: Done 184 tasks      | elapsed:    1.4s\n",
      "[Parallel(n_jobs=8)]: Done 434 tasks      | elapsed:    2.8s\n",
      "[Parallel(n_jobs=8)]: Done 784 tasks      | elapsed:    5.4s\n",
      "[Parallel(n_jobs=8)]: Done 1234 tasks      | elapsed:    8.9s\n",
      "[Parallel(n_jobs=8)]: Done 1784 tasks      | elapsed:   11.3s\n",
      "[Parallel(n_jobs=8)]: Done 2434 tasks      | elapsed:   13.7s\n",
      "[Parallel(n_jobs=8)]: Done 3184 tasks      | elapsed:   16.8s\n",
      "[Parallel(n_jobs=8)]: Done 4034 tasks      | elapsed:   21.3s\n",
      "[Parallel(n_jobs=8)]: Done 4984 tasks      | elapsed:   27.2s\n",
      "[Parallel(n_jobs=8)]: Done 6000 out of 6000 | elapsed:   31.6s finished\n",
      "[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.\n",
      "[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.2s\n",
      "[Parallel(n_jobs=8)]: Done 184 tasks      | elapsed:    1.2s\n",
      "[Parallel(n_jobs=8)]: Done 434 tasks      | elapsed:    2.9s\n",
      "[Parallel(n_jobs=8)]: Done 784 tasks      | elapsed:    5.6s\n",
      "[Parallel(n_jobs=8)]: Done 1234 tasks      | elapsed:    8.3s\n",
      "[Parallel(n_jobs=8)]: Done 1784 tasks      | elapsed:   12.0s\n",
      "[Parallel(n_jobs=8)]: Done 2434 tasks      | elapsed:   16.1s\n",
      "[Parallel(n_jobs=8)]: Done 3184 tasks      | elapsed:   20.7s\n",
      "[Parallel(n_jobs=8)]: Done 4034 tasks      | elapsed:   26.6s\n",
      "[Parallel(n_jobs=8)]: Done 4984 tasks      | elapsed:   32.6s\n",
      "[Parallel(n_jobs=8)]: Done 6034 tasks      | elapsed:   38.1s\n",
      "[Parallel(n_jobs=8)]: Done 7184 tasks      | elapsed:   44.6s\n",
      "[Parallel(n_jobs=8)]: Done 8434 tasks      | elapsed:   51.2s\n",
      "[Parallel(n_jobs=8)]: Done 9000 out of 9000 | elapsed:   55.0s finished\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.002403</td>\n",
       "      <td>0.064266</td>\n",
       "      <td>0.598503</td>\n",
       "      <td>0.261213</td>\n",
       "      <td>0.073615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.000045</td>\n",
       "      <td>0.005574</td>\n",
       "      <td>0.031738</td>\n",
       "      <td>0.955256</td>\n",
       "      <td>0.007386</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.000065</td>\n",
       "      <td>0.004153</td>\n",
       "      <td>0.055776</td>\n",
       "      <td>0.925023</td>\n",
       "      <td>0.014984</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.000066</td>\n",
       "      <td>0.005355</td>\n",
       "      <td>0.046603</td>\n",
       "      <td>0.933864</td>\n",
       "      <td>0.014112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.000074</td>\n",
       "      <td>0.001058</td>\n",
       "      <td>0.008534</td>\n",
       "      <td>0.986190</td>\n",
       "      <td>0.004144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12135</th>\n",
       "      <td>0.000246</td>\n",
       "      <td>0.029402</td>\n",
       "      <td>0.155123</td>\n",
       "      <td>0.784889</td>\n",
       "      <td>0.030340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12136</th>\n",
       "      <td>0.000049</td>\n",
       "      <td>0.000910</td>\n",
       "      <td>0.009918</td>\n",
       "      <td>0.986506</td>\n",
       "      <td>0.002616</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12137</th>\n",
       "      <td>0.000048</td>\n",
       "      <td>0.002572</td>\n",
       "      <td>0.020809</td>\n",
       "      <td>0.973723</td>\n",
       "      <td>0.002849</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12138</th>\n",
       "      <td>0.000058</td>\n",
       "      <td>0.003624</td>\n",
       "      <td>0.014762</td>\n",
       "      <td>0.978950</td>\n",
       "      <td>0.002607</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12139</th>\n",
       "      <td>0.000437</td>\n",
       "      <td>0.061141</td>\n",
       "      <td>0.271522</td>\n",
       "      <td>0.630588</td>\n",
       "      <td>0.036313</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>12140 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "              0         1         2         3         4\n",
       "0      0.002403  0.064266  0.598503  0.261213  0.073615\n",
       "1      0.000045  0.005574  0.031738  0.955256  0.007386\n",
       "2      0.000065  0.004153  0.055776  0.925023  0.014984\n",
       "3      0.000066  0.005355  0.046603  0.933864  0.014112\n",
       "4      0.000074  0.001058  0.008534  0.986190  0.004144\n",
       "...         ...       ...       ...       ...       ...\n",
       "12135  0.000246  0.029402  0.155123  0.784889  0.030340\n",
       "12136  0.000049  0.000910  0.009918  0.986506  0.002616\n",
       "12137  0.000048  0.002572  0.020809  0.973723  0.002849\n",
       "12138  0.000058  0.003624  0.014762  0.978950  0.002607\n",
       "12139  0.000437  0.061141  0.271522  0.630588  0.036313\n",
       "\n",
       "[12140 rows x 5 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 5min 54s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "\n",
    "# Instantiate CatBOOST\n",
    "model_cat = CatBoostClassifier(metric_period=400,\n",
    "                          random_state=seed,\n",
    "                          learning_rate=0.01,\n",
    "                          loss_function = 'MultiClass',\n",
    "                          border_count=1500)   # Catboost\n",
    "\n",
    "# Instantiate ExtraTreeClassifer\n",
    "model_etr=ExtraTreesClassifier(\n",
    "    n_estimators = 6000, max_depth = None, n_jobs = -1, random_state = seed, verbose = 1,bootstrap=True)     # ExtraTreesClassifier\n",
    "\n",
    "# Instantiate RandomForestClassifier\n",
    "model_rf=RandomForestClassifier(\n",
    "    n_estimators = 9000,  max_depth = None, n_jobs = -1, random_state = seed, verbose = 1)    # RandomForestClassifier\n",
    "\n",
    "\n",
    "# Combine all models(estimators) and use 'soft voting' -- Voting Classifier\n",
    "vote=VotingClassifier(estimators = [(\n",
    "    'CatBoost', model_cat), ('ETR', model_etr), ('RF', model_rf)], voting = 'soft', weights = [8, 6, 8])\n",
    "\n",
    "\n",
    "\n",
    "# Fit the Model\n",
    "vote.fit(X_scaled, y)\n",
    "\n",
    "# Make Predictions and convert Predictions to Dataframe\n",
    "pred = pd.DataFrame(vote.predict_proba(test_scaled))\n",
    "display(pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-01-25T14:12:05.307784Z",
     "start_time": "2021-01-25T14:11:55.533Z"
    }
   },
   "source": [
    "# Prediction Adjuster \n",
    "\n",
    "Steps:-\n",
    "1. Import 'common_grounds.csv' file \n",
    "2. Loop through each predictions \n",
    "3. Ensure predictions are adjusted \n",
    "4. create final submission file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:35:48.661004Z",
     "start_time": "2021-02-07T15:35:40.381016Z"
    }
   },
   "outputs": [],
   "source": [
    "# Adjusting Predictions -- Using previously created common_grounds file\n",
    "\n",
    "# Read file \n",
    "cg = pd.read_csv('common_grounds.csv', index_col=0)\n",
    "cg = cg.replace({3:2,4:3,5:4}) # Align indexes as per prediction file\n",
    "#display(cg.head(10))\n",
    "\n",
    "# To loop through each predictions row-wise\n",
    "for i in cg.index:\n",
    "    pos = cg.loc[i]\n",
    "    for n in [0,1,2,3,4]: # (0,1,3,4,5) and (0,1,2,3,4) gives same prediction and creates proper submission file \n",
    "        pred.loc[i,n] = 0\n",
    "    pred.loc[i,pos] = 1 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Submission File Creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-02-07T15:35:49.393216Z",
     "start_time": "2021-02-07T15:35:48.982631Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission File A_7feb_E03.csv generated and saved as CSV\n"
     ]
    }
   ],
   "source": [
    "# Create Submission file \n",
    "\n",
    "pred.to_csv(subfile_name,index=False)\n",
    "print(f'Submission File {subfile_name} generated and saved as CSV')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span style=\"font-family: Arial; font-weight:bold;font-size:1.9em;color:#f97102;\"> -------------------------End of File---------------------------------"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {
    "height": "167px",
    "width": "236px"
   },
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "307.2px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
